Preface

In this document, we’ll go through the many many different steps I’ve taken in an attempt to analyze several years of waterbird surveys at Estero La Cruz in Bahía de Kino, Sonora. Rather than only include the final model and results, I started this document when I began working with the data– so this document has evolved and grown alongside me as I worked through the various problems/errors/roadbloacks I encountered. While initally working with the data, I tried to use multiple variables to take advantage of the various data we collect at each survey. However, I’ve learned a lot about modeling and ultimately found myself following the advice to keep it all as simple as possible. By the end of this document, you’ll see that a lot of time and effort has been put into this to address a simple question- Has the number of shorebirds at Estero La Cruz changed over time?

Los Datos

The first steps are loading all required packages and of course—the data. This may seem like a long list of packages, but I’ve only added in the important ones that I know are being applied. Any additional packages that those on this list are dependent upon will also be loaded.

All the necessary files should be saved in the working directory. The files we’ll Counts of birds in each guild for each survey at Estero La Cruz from 2013/14 to present cycle and these are “environment variables” taken at each survey. Similar to the biodiversity

Here’s a sneak peak of the data–as you can see it’s set up with each survey effort as a row with every guild as a column.

Preview of Guild Totals
Date shore gulls_terns pel_corm land
2013-09-12 867 315 286 12
2013-09-27 2129 1996 363 4
2013-10-11 793 587 506 6
2013-10-25 1243 1508 459 1
2013-11-07 2704 1473 4527 6

Next, we’ll do a bit of cleaning of the data. To up the environmental variables, we’ll parse out the “tide” variable into “tide_height” and “tide_dir”. This pipe code comes from the code Abram sent me.

Preliminary Plots

To start, I made some boxplots to at each of the guilds, grouped by field season (temporada) and season (estación):

Also, some background- I have divided the seasons as 1-Fall 2-Winter 3-Spring—these seasons were decided as how to split the data because of migration. Someone had suggested maybe grouping fall and spring (again because migration) and later I end up doing it. In my efforts of working/testing/exploring models, it became apparent that springs and fall were pretty consistently similar enough that later, I did end of grouping them.

Note that in the data the variable “temporada” is actually the field season that runs fall-summer. So “20” is the fall 2019-present, “19” is fall 2018-spring 2019, etc.

Initial Shorebird Abundance Modeling

Here is some of my initial messy exploration/attempts at modeling to look for trends/changes in shorebird abundance over time. Because we’re working with count data, I started with a Poisson distribution. From what I’ve read, I found that overdispersion is a common problem when modeling count data so I kept that in mind.

## [1] 1
## [1] 1569.193

So both should be close to 1. Not 1 or 309… Our output of 309 means there is a lot of overdispersion in the model, or that there is a lot of variation unaccountated for through the model. This is because in a poisson distribution the mean and the variance are supposed to be equal. Overdispersion means that the variance is much greater than the mean.
So next I tried to simplify the model and look just at seasonality…

## 
## Call:
## glm(formula = shore ~ szn, family = poisson(link = "log"), data = guild_totals)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -74.587  -43.110  -23.716    8.658  171.421  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.156518   0.006282 1298.36   <2e-16 ***
## szn         -0.212637   0.002826  -75.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 281954  on 112  degrees of freedom
## Residual deviance: 276346  on 111  degrees of freedom
## AIC: 277346
## 
## Number of Fisher Scoring iterations: 5
## [1] 1

Clearly, still dealing with some overdispersion problems so next I tried running a glm but this time with a quasipoisson distribution.

## 
## Call:
## glm(formula = shore ~ szn + Temp + mes, family = quasipoisson, 
##     data = guild_totals)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -67.26  -39.96  -21.33   10.09  146.10  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.31364    0.82511  12.500  < 2e-16 ***
## szn         -0.44928    0.18406  -2.441  0.01629 *  
## Temp        -0.01674    0.01156  -1.448  0.15054    
## mes         -0.08600    0.03088  -2.785  0.00633 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2844.066)
## 
##     Null deviance: 281311  on 110  degrees of freedom
## Residual deviance: 238372  on 107  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5
## [1] 2227.779

Once again, I’m stll having problems with the model being quite overdispersed. So next I tried a negative binomial model.

## 
## Call:
## glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir, 
##     data = guild_totals, link = "log", init.theta = 1.037219293)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.3761  -1.0425  -0.4420   0.2677   1.9843  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    11.51790    0.80757  14.262  < 2e-16 ***
## szn            -0.45625    0.16444  -2.775  0.00553 ** 
## Temp           -0.03536    0.01066  -3.317  0.00091 ***
## mes            -0.06486    0.02797  -2.319  0.02040 *  
## tide_heightlow -1.05641    0.32682  -3.232  0.00123 ** 
## tide_heightmid -0.30532    0.32105  -0.951  0.34161    
## tide_heightunk -0.88247    0.75515  -1.169  0.24257    
## tide_dirrising  0.18163    0.28887   0.629  0.52952    
## tide_dirslack   0.61548    0.28815   2.136  0.03268 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.0372) family taken to be 1)
## 
##     Null deviance: 168.11  on 110  degrees of freedom
## Residual deviance: 127.54  on 102  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 1912.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.037 
##           Std. Err.:  0.123 
## 
##  2 x log-likelihood:  -1892.659
## [1] 1.250395

This is looking a lot better, and actually there is no longer the very high degree of overdispersion which is great. Looking back on So lets check the residuals real quick:

Not too terrible… From these plots of the residuals, there is a bit of a pattern to them, as well as a few data points that jump out (30 and 47). Though it does raise some flags, it looks decent. But rather than being satisfied with this, I kept exploring.

Tansforming the data

A zero-inflated model seemingly is another route to go, but that wouldn’t work or make any sense at all given I don’t have any zeros in this data. Though there are varying opinions for and against it, I thought maybe I should try transforming the data. This seems like a valid thing to given the lack of zeros in this data and that seemed to be one big argument against transforming the data. I checked it out, knowing this might work with the shorbird guild. While some of the other surveys of guilds do have zeros, from my understanding there is not an amount that might call for a zero-inflated or truncated poisson model. But that might be a later conversation with a statistician.

Given the data looks a lot closer to a normal distribution, I decided to see what would happen if I tried running the models with transformed data. So I tried just a classic gaussian distribution glm.

## 
## Call:
## glm(formula = log(shore) ~ temporada + szn + mes + Temp + tide_height + 
##     tide_dir, data = guild_totals)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.0272  -0.6739   0.0511   0.7138   2.1781  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    20.67259    2.01238  10.273  < 2e-16 ***
## temporada      -0.39973    0.07894  -5.064 1.85e-06 ***
## szn            -0.51013    0.18946  -2.693   0.0083 ** 
## mes            -0.03108    0.03283  -0.947   0.3461    
## Temp           -0.08332    0.01392  -5.986 3.30e-08 ***
## tide_heightlow -0.15059    0.38954  -0.387   0.6999    
## tide_heightmid  0.29917    0.37186   0.805   0.4230    
## tide_heightunk  0.58505    0.87764   0.667   0.5065    
## tide_dirrising  0.10734    0.33373   0.322   0.7484    
## tide_dirslack  -0.15400    0.35732  -0.431   0.6674    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.273648)
## 
##     Null deviance: 211.48  on 110  degrees of freedom
## Residual deviance: 128.64  on 101  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 353.37
## 
## Number of Fisher Scoring iterations: 2
## [1] 0.9668983

This is also looking pretty decent and lke a potential route to follow. And it might be better because I’ll get lower AICc values but I’m not sure if I’ll be able to replicate this with the other guilds. And now let’s check out the residuals:

Again, two data points are jumping out of the data (37 and 57)….so gonna take a look at the data… Let’s see if the data is actually distributed normally by running a Shapiro-Wilk Normality test.

## 
##  Shapiro-Wilk normality test
## 
## data:  guild_totals$shore
## W = 0.72117, p-value = 2.363e-13

YIKES. Now lets check the log-transformed data.

## 
##  Shapiro-Wilk normality test
## 
## data:  log(guild_totals$shore)
## W = 0.96505, p-value = 0.004724

Ok so that’s also not dristributed normally. Soo with some fresh thoughts, I went to maybe trying to use monthly means.

Using Monthly Means?

I calculated the mean number of shorebirds per month for each year. This might work because there are typically two surveys per month. But one reason it might is every number will end with .5 or an interger.

## Parsed with column specification:
## cols(
##   mes.yr = col_character(),
##   shore = col_double(),
##   temp = col_double(),
##   ciclo = col_double(),
##   szn = col_double(),
##   mes = col_double(),
##   year = col_double()
## )
## [1] "mes.yr" "shore"  "temp"   "ciclo"  "szn"    "mes"    "year"

So next I ran some glms wth these means.

## 
## Call:
## glm(formula = shore ~ mes + ciclo + szn + temp, data = shoremeans)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -954.04  -216.36   -71.63   206.30  1307.65  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7128.676   1528.669   4.663 3.78e-05 ***
## mes          -16.668     20.383  -0.818 0.418612    
## ciclo       -162.784     58.600  -2.778 0.008453 ** 
## szn         -217.960    124.872  -1.745 0.088984 .  
## temp         -41.648      9.977  -4.174 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 240965.4)
## 
##     Null deviance: 14275232  on 42  degrees of freedom
## Residual deviance:  9156684  on 38  degrees of freedom
## AIC: 661.59
## 
## Number of Fisher Scoring iterations: 2

This model is looking like a pretty poor fit. My thought is that because even though now we are working with means, the data is still mostly integers. So I tried the same thing, but with a gamma distribution, mostly because it’s a different distribution.

## 
## Call:
## glm(formula = shore ~ mes + ciclo + szn + temp, family = Gamma, 
##     data = shoremeans)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.47927  -0.41518   0.01433   0.21005   1.12005  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.729e-03  1.590e-03  -3.604 0.000897 ***
## mes          1.717e-05  1.927e-05   0.891 0.378291    
## ciclo        1.700e-04  6.038e-05   2.815 0.007682 ** 
## szn          3.308e-04  1.984e-04   1.667 0.103767    
## temp         4.759e-05  1.137e-05   4.187 0.000162 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.3100687)
## 
##     Null deviance: 21.647  on 42  degrees of freedom
## Residual deviance: 15.538  on 38  degrees of freedom
## AIC: 655.36
## 
## Number of Fisher Scoring iterations: 5

Ok ok looking better but this can maybe be cleaner. So I tried transforming the means and doing it again. First I tried square-root transformation.

## 
## Call:
## glm(formula = sqrt(shore) ~ ciclo + szn + temp, data = shoremeans)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -18.1675   -4.5416    0.5746    3.1148   15.6001  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 133.3573    24.9214   5.351 4.11e-06 ***
## ciclo        -2.6959     0.9488  -2.841  0.00711 ** 
## szn          -3.4310     1.8177  -1.888  0.06654 .  
## temp         -0.7485     0.1608  -4.655 3.70e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 64.0638)
## 
##     Null deviance: 4026.6  on 42  degrees of freedom
## Residual deviance: 2498.5  on 39  degrees of freedom
## AIC: 306.71
## 
## Number of Fisher Scoring iterations: 2

This doesn’t look great- just look at the deviance. So then I tried a log transformation.

## 
## Call:
## glm(formula = log(shore) ~ ciclo + szn + temp + mes, data = shoremeans)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8739  -0.2273   0.1177   0.4170   1.0157  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.00221    2.09164   7.172 1.44e-08 ***
## ciclo       -0.20641    0.08018  -2.574   0.0141 *  
## szn         -0.35234    0.17086  -2.062   0.0461 *  
## temp        -0.05979    0.01365  -4.380 9.01e-05 ***
## mes         -0.00876    0.02789  -0.314   0.7552    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.4511314)
## 
##     Null deviance: 27.580  on 42  degrees of freedom
## Residual deviance: 17.143  on 38  degrees of freedom
## AIC: 94.485
## 
## Number of Fisher Scoring iterations: 2

Also, I went back to the begining of Abram’s “glm_esteros” code and saw the gamma distribution glm. So I tried that and got some surprising results. Will may bring that into all this later. And actually this is looking pretty decent. So lets take a quick look at the residuals.

Just from the residuals, definitely looks like there are some patterns in there- soooo maybe not the best fit, even if some of the other parts of modeling are seemingly working out better. Plus, the qq-plot isn’t too convincing.

Model Selection

Great so we’ve got some models, Maybe they aren’t the best but it’s what I have to work with so far. Now we need to select and compare them. One way is to use inforation criteria such as the AICc. But there are also some neat ways to automate it. At the end of Abram’s code, I found the function dredge() so I looked into it. Here I’m using the “shoremeans” file I just created above.

## [1] 101.0197

This new model is the same as performed above whose residuals are shown. So now that we have the model saved, we can use the dredge() function to get a model selection table. When I was initially running the function, I got some errors. With help of interwebs and help(), I realized that I needed to change the way R deals with NA values. So before using that function, make sure the code options(na.action = "na.fail") runs—or else you’ll get an errror.

## Fixed term is "(Intercept)"
## Global model call: glm(formula = log(shore) ~ ciclo + temp + szn + mes, data = shoremeans)
## ---
## Model selection table 
##    (Intrc)   ciclo      mes     szn     temp    R^2 df  logLik AICc delta
## 14  14.990 -0.2094          -0.3282 -0.06044 0.3768  5 -41.298 94.2  0.00
## 10  13.830 -0.1938                  -0.05751 0.3011  4 -43.764 96.6  2.36
## 16  15.000 -0.2064 -0.00876 -0.3523 -0.05979 0.3784  6 -41.243 96.8  2.60
## 12  13.980 -0.2019  0.01715         -0.05920 0.3089  5 -43.523 98.7  4.45
## 13   9.983                  -0.2918 -0.04173 0.2637  4 -44.884 98.8  4.60
##    weight
## 14  0.559
## 10  0.172
## 16  0.152
## 12  0.060
## 13  0.056
## Models ranked by AICc(x)

Pretty neat. Now let’s save another model that we had check earlier– the gamma distribution. Feel free to scroll up to check the output of the model.

## Global model call: glm(formula = shore ~ szn + temp + mes + ciclo, family = Gamma, 
##     data = shoremeans)
## ---
## Model selection table 
##      (Intrc)     ciclo       mes       szn      temp    R^2 df   logLik  AICc
## 10 -0.004841 0.0001664                     4.688e-05 0.2537  4 -323.017 655.1
## 14 -0.005476 0.0001683           0.0002793 4.741e-05 0.2867  5 -322.044 655.7
## 12 -0.004942 0.0001681 8.020e-06           4.721e-05 0.2565  5 -322.938 657.5
## 16 -0.005729 0.0001700 1.717e-05 0.0003308 4.759e-05 0.2987  6 -321.681 657.7
## 9  -0.001070                               3.352e-05 0.1557  3 -325.670 658.0
##    delta weight
## 10  0.00  0.393
## 14  0.62  0.288
## 12  2.41  0.118
## 16  2.61  0.107
## 9   2.87  0.094
## Models ranked by AICc(x)

The dredge() function works with various types of model. For example, it will also work with negative binomial glms, so here we’ll use the shore_nb_mod from earlier–aka one of the models whose residuals were graphed above.

## 
## Call:  glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir, 
##     data = guild_totals, link = "log", init.theta = 1.037219293)
## 
## Coefficients:
##    (Intercept)             szn            Temp             mes  tide_heightlow  
##       11.51790        -0.45625        -0.03536        -0.06486        -1.05641  
## tide_heightmid  tide_heightunk  tide_dirrising   tide_dirslack  
##       -0.30532        -0.88247         0.18163         0.61548  
## 
## Degrees of Freedom: 110 Total (i.e. Null);  102 Residual
##   (2 observations deleted due to missingness)
## Null Deviance:       168.1 
## Residual Deviance: 127.5     AIC: 1913
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #0 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #1 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #2 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #3 [113] different from that in global model [111]
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 4 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 5 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 6 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 7 skipped)
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #8 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #9 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #10 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #11 [113] different from that in global model [111]
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 12 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 13 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 14 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 15 skipped)
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #16 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #17 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #18 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #19 [113] different from that in global model [111]
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 20 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 21 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 22 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 23 skipped)
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #24 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #25 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #26 [113] different from that in global model [111]
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #27 [113] different from that in global model [111]
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 28 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 29 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 30 skipped)
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 31 skipped)
## Global model call: glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir, 
##     data = guild_totals, link = "log", init.theta = 1.037219293)
## ---
## Model selection table 
##    (Int)      mes     szn tid_dir tid_hgh    R^2 df   logLik   AICc delta
## 20 9.592 -0.07122 -0.5178               + 0.1936  7 -969.127 1953.3     0
## 28 9.354 -0.07100 -0.5486       +       + 0.2125  9 -967.784 1955.3     2
##    weight
## 20  0.731
## 28  0.269
## Models ranked by AICc(x)
## 
## Call:
## model.avg(object = mod_nb_dredge, subset = delta < 4)
## 
## Component model call: 
## glm.nb(formula = shore ~ <2 unique rhs>, data = guild_totals, link = 
##      log, init.theta = 1.037219293)
## 
## Component models: 
##      df  logLik    AICc delta weight
## 124   7 -969.13 1953.32     0   0.73
## 1234  9 -967.78 1955.32     2   0.27
## 
## Term codes: 
##         mes         szn    tide_dir tide_height 
##           1           2           3           4 
## 
## Model-averaged coefficients:  
## (full average) 
##                 Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept)     9.527770   0.523135    0.528913  18.014  < 2e-16 ***
## mes            -0.071158   0.028167    0.028491   2.498  0.01251 *  
## szn            -0.526081   0.166354    0.168253   3.127  0.00177 ** 
## tide_heightlow -0.947298   0.331006    0.334728   2.830  0.00465 ** 
## tide_heightmid -0.249012   0.315010    0.318494   0.782  0.43431    
## tide_heightunk -0.783498   0.662821    0.670439   1.169  0.24255    
## tide_dirrising  0.009135   0.148663    0.150378   0.061  0.95156    
## tide_dirslack   0.094990   0.212831    0.213975   0.444  0.65709    
##  
## (conditional average) 
##                Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept)     9.52777    0.52313     0.52891  18.014  < 2e-16 ***
## mes            -0.07116    0.02817     0.02849   2.498  0.01251 *  
## szn            -0.52608    0.16635     0.16825   3.127  0.00177 ** 
## tide_heightlow -0.94730    0.33101     0.33473   2.830  0.00465 ** 
## tide_heightmid -0.24901    0.31501     0.31849   0.782  0.43431    
## tide_heightunk -0.78350    0.66282     0.67044   1.169  0.24255    
## tide_dirrising  0.03391    0.28495     0.28828   0.118  0.90636    
## tide_dirslack   0.35261    0.27804     0.28128   1.254  0.20999    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the code above, I turned off the warning messages as many warning messages as I was finding out that negative binomal models can be a bit tricky to work with. Ok this was cool and some good practice in modeling, but I still hadn’t necessarily found the best way to model my data that stays true to the biology. And thus I entered…

The World of GLMERs

I started playing with the idea of mixed models because they are a common tool used in biological modeling. Given the data we’re wworking with and question we’re trying to answer, the random effect that needs to be taken into account with this modeling is the observer(s). With several years of data and different people counting birds over the years, that’s probably one of the biggest sources of variation in the data—and hence random effect of the observer(s)!!

I removed warnings and messages from above to save some space. But what I can tell you is there are pretty trash models, regardless of whether or not the cycle is a read as a factor or a continuous vaariable. I was testing the model in this way because I was unsure of how year should be read. I’ve determined that in this, I’m not trying to compare each year to one another but rather look for an overall trend over time.

Singluarity, convergence, and just lots of problems and headaches that can be avoided. What I’ve done to troubleshoot is going into some different directions. Just for shits and giggles though, let’s go ahead and run a quick line to see how the automated model selection might pick the best model.

## Global model call: glmer(formula = shore ~ ciclo + szn + tide_height + tide_dir + 
##     mes + (1 | obs), data = guild_totals, family = "poisson")
## ---
## Model selection table 
##    (Int) ccl      mes     szn tid_dir tid_hgh df   logLik     AICc delta weight
## 32 10.27   + -0.06983 -0.6141       +       + 15 -51473.9 102982.7     0      1
## Models ranked by AICc(x) 
## Random terms (all models): 
## '1 | obs'

This is still a bit of a headache. But thats ok. Just as a refresher so far I’ve tried GLMs of all sorts (poisson, quasipoisson, negative binomials, log-transformed gaussian) and now we’ve added a poisson GLMER. Since working with the data thus far has seemed to best fit the negative binomial distribution, lets used that in our mixed model.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.3551)  ( log )
## Formula: shore ~ temporada + season + (1 | obs)
##    Data: guild_totals
## 
##      AIC      BIC   logLik deviance df.resid 
##   1929.1   1945.4   -958.5   1917.1      107 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.1564 -0.6659 -0.1840  0.3693  3.5382 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2997   0.5474  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 10.73328    1.25467   8.555  < 2e-16 ***
## temporada   -0.22363    0.07289  -3.068  0.00215 ** 
## season2      0.87871    0.27830   3.157  0.00159 ** 
## season3     -0.31475    0.29344  -1.073  0.28343    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd seasn2
## temporada -0.978              
## season2   -0.206  0.041       
## season3   -0.275  0.115  0.735
## $residDev
## [1] 90.17925
## 
## $residDF
## [1] 108
## 
## $ratio
## [1] 0.8349931
## 
## $P.ChiSq
## [1] 0.8925984

Alright- it looks like we aren’t dealing with an absurd amount of overdispersion that was previously a problem.

Combining Migration Seaons??

I have only begun looking into the world of testing models and hypotheses. One method I encountered to test general hypotheses was an example I found on the internet proved useful. Essentially what I was interested in testing was whether or not there are differences between the seasons. This sets up a pairwise comparison between factors using a Tukey test.

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glmer(formula = shore ~ temporada + season + (1 | obs), data = guild_totals, 
##     family = MASS::negative.binomial(theta = 1.35510116706339))
## 
## Linear Hypotheses:
##            Estimate Std. Error z value Pr(>|z|)    
## 2 - 1 == 0   0.8787     0.2783   3.157  0.00439 ** 
## 3 - 1 == 0  -0.3148     0.2934  -1.073  0.52654    
## 3 - 2 == 0  -1.1935     0.2088  -5.717  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

This is pretty neat. It shows which factors are siginificantly different from one another and which are similar enough to be the same. Since there are only 3 factors in the variable of season, the output is pretty small and easy to read. But imagine if there were 10 different factors of the variable? Could get pretty messy pretty quickly—so another way to look at each factor is to use letters. This is easier to follow and thus more accessible to interpret determine whether factors were similar or significantly different.

##   1   2   3 
## "b" "a" "b"

So what all that means is that spring and fall are similar to one another (hence the same letter) but winter is different. This makes sense given there is an influx of migrants in the winter and birds similarly moving through or completely absent in the fall and spring. In the OG data, 1=fall, 2=winter, 3=spring. Below I’ll change the data so that fall/spring=1 and keep 2=winter using a simple ifelse() command.

Now that the data is combined, let’s take a new look at it. Here I used more or less the same code as the boxplots at the beginning of the document.

And just a refresher/overview, let’s look at the data, but rather than being divided by season, it is only divided by the work cycle/year (“temporada”).

Final Model Selection and Interpretation

Since I haven’t been able to get the automated model selection dredge() to work with the negative binomial glmer, there are some other ways to do what I want to do and compare the models. It’s not quite as pretty but as far as I understand model selection, it makes sense. First we’ll make a null model and go through some steps to change/add variables to our models.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.3527)  ( log )
## Formula: shore ~ temporada + season + (1 | obs)
##    Data: guild_totals
## 
##      AIC      BIC   logLik deviance df.resid 
##   1928.2   1941.9   -959.1   1918.2      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.1532 -0.6477 -0.2305  0.3518  3.0888 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.3156   0.5618  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 10.35574    1.23492   8.386  < 2e-16 ***
## temporada   -0.21397    0.07405  -2.890  0.00386 ** 
## season2      1.09239    0.18936   5.769 7.98e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.991       
## season2   -0.017 -0.053

Just from an intial look at this model summary, it looks like there is a significant decline in the abundance of shorebirds over time. It also appears, as expected, that there is a significant increase in in the winter months when compared to the rest of the year. Before we saying anything more, let’s do a handful of other steps to check the model and make sure this will be our final model.

## $residDev
## [1] 87.99499
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.8072935
## 
## $P.ChiSq
## [1] 0.9305589
## Data: guild_totals
## Models:
## null.shore: shore ~ 1 + (1 | obs)
## glmernb.shore1: shore ~ temporada + season + (1 | obs)
##                npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## null.shore        3 1958.4 1966.6 -976.19   1952.4                         
## glmernb.shore1    5 1928.2 1941.9 -959.12   1918.2 34.148  2  3.845e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since there is a significant difference, this means we can reject the null model and accept the simple negative binomial mixed model as a betterr fit to the data. I also feel OK about that given the AICc/BIC are slightly lower, and the log-Liklihood has moved a bit closer to 0 (which is another one of many things to look for).

Selecting a Model

There are a wide variety of criteria and methods that are used to select models. Here, I go through a few different Now let’s add in some other potential variables that would affect the presence of birds, in this case, the tide!

We can make a neat model selection table based on information selection criteria. Here we will use the second order AIC (AICc) for comparison. For the model names, I abbreviated the names into “modX” so note that “mod1” = “glmernb.shore1”, “mod2” = “glmernb.shore2”, etc…

## 
## Model selection based on AICc:
## 
##       K    AICc Delta_AICc AICcWt Cum.Wt      LL
## mod1  5 1928.79       0.00   0.40   0.40 -959.12
## mod3  7 1929.16       0.37   0.33   0.74 -957.05
## mod2  8 1930.45       1.65   0.18   0.91 -956.53
## mod4 10 1931.81       3.02   0.09   1.00 -954.83
## null  3 1958.60      29.81   0.00   1.00 -976.19

Let’s do another check by comparing increasingly complex models using ANOVAs. If the Chi-squared value is significant, that means that the model is a better fit. We will set up comparisons between our simplest model (glmernb.shore1) and each of the increasingly complex models. While we can use the code to set up single comparisons between models (ie. anova(mod1, mod2)) we can also combine it into one line of code such as anova(mod1, mod2, mod3, ...) and we will get a table that compares our simplest model (mod1) with the rest of the models.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
glmernb.shore1 5 1928.231 1941.868 -959.1156 1918.231 NA NA NA
glmernb.shore3 7 1928.096 1947.187 -957.0479 1914.096 4.135420 2 0.1264751
glmernb.shore2 8 1929.062 1950.881 -956.5311 1913.062 1.033537 1 0.3093293
glmernb.shore4 10 1929.656 1956.930 -954.8280 1909.656 3.406285 2 0.1821103

What each of these show, is that with adding in the height and/or direction of the tide, we do not get a better fitting model. That means we should stick to the first (and most simple!) model as is the best fit.

Another potential approach to model selection would be to average the models. But rather than follow more coomplicated steps to do that, I think it’s OK to stick with this process of selecting the best model

Calculating R^2

Now that we have selected a model, one interesting piece of information about the model is checking the R^2 value. While this measure is commonly used in lms and glms it can be applied to mixed models. Here the function will calculate the R^2 values taking into account only the fixed effects (marginal) as well as all effects (conditionall) using 3 different methods. Here’s the explanataion taken from the help("r.squaredGLMM"):

Marginal R_GLMM² represents the variance explained by the fixed effects, and is defined as: \[R_GLMM(m)² = (σ_f²) / (σ_f² + σ_α² + σ_ε²)\] Conditional R_GLMM² is interpreted as a variance explained by the entire model, including both fixed and random effects, and is calculated according to the equation: \[R_GLMM(c)² = (σ_f² + σ_α²) / (σ_f² + σ_α² + σ_ε²)\] where σ_f² is the variance of the fixed effect components, σ_α² is the variance of the random effects, and σ_ε² is the “observation-level” variance.

So with that we can make a table of these values:

##                 R2m       R2c
## delta     0.2803543 0.4955452
## lognormal 0.3210871 0.5675431
## trigamma  0.2281494 0.4032695

From the this, it looks like our model does a pretty decent job of fitting the data and explaining the variance. Given we’re modeling change over time and not taking into account other environmental variables that are likely affecting the presence/absence/abundance of birds, I feel okay about believing this model.

Check Residuals

We’ll run more or less the same code as we used above. See the .rmd file for code. Recall that the difference now is that

Again, it looks more or less the same as before. Our residuals aren’t perfectly randomly distributed but I’m not sure what I could do at this point to get a better fit. Given the process to get to this model, I think it’s acceptable for the sake of this analysis.

Plotting Model Predictions

Now it’s time to look at our model predictions. One of the most important aspects of a regression are the estimates of each variable in the model. Since we’re working with negative binomial models, there is a log link meaning the effect estimates need to be log transformed. We can do that easily by using R as a fancy calculator. Before we do that, let’s take a quick look at the different estimates of the model.

The first thing you might notice is that the intercept is very high- yes but it’s not as important. If we think about the data we’re modeling, there are counts of shorebirds in the thousands, and in the first year of data (2013-2014 season), there is a very high degree of variability.
What we really want to pay more attention to are the estimates of two fixed effects of the model. Just by looking at this grarph, the effect of season is a positive increase while that of temporada(which is the year/cycle) is negative.
If you recall from the summary output of the model the estimate of the variable temporada was: -0.2139745. To translate that into normal terms, as the year/cycle changes by 1, the change in shorebird abundance is multipled by a factor of 0.8073726. This can be interpreted into that “the abundance of shorebirds is declining by 19.2627394% each year.” One way we can visualize this is by plotting the predicted values on a graph. We can plot the overall trend of the model. Note that this will back-transform the predicted values to the original scale of our response variable.

However, we know there are differences between the season and that there was a significant effect in the model. From the model estimate of that variable, we can say winter season sees an increase of 298.1387504% in the number of shorebirds in comparison to the rest of the year. Knowing that the difference is so great, let’s take a look at the same graph as above, but grouped by season. Recall that the seasons are split into 1: Fall/Sping and 2: Winter.

Note that both graphs above are the same data, just plotted differently.

In both graphs, the seasons are split into 1 (Fall/Sping–the migration seasons) in purple and 2 (Winter) in blue. These colors will be consistent throughout the rest of this document. This shows that while shorebird abundance is declining in both seasons, it’s clear that the decline is more pronounced in the winter-we can visually see this as the slope of the line is steeper. This could potentially be due a decline in the migratory species overwintering, so it would be a good idea to look further into the migratory species.

Plot the random effects. And also a diagnostic plot: > For generalized linear mixed models, returns the QQ-plot for random effects.

So diagnostic plot is essentially just a qq plot

## $obs
## `geom_smooth()` using formula 'y ~ x'

The first plot shows the random effects. The red line indicates no effect. The blue indicates a positive effect while purple is negtative. The second plot above, as mentioned, is a qqplot.

## Warning: Ignoring unknown parameters: width, conf.int

Convergence Problem Troubleshooting

Both during my model exploration with shorebirds as well as with other guilds, I ran into a handful of problems. One common problem I ran into was a failure for the model to converge. This convergence failure essentially means that the model fits the data extremely poorly, or the observations just don’t match up. But this isn’t the end of the world. For example, when I tried to apply a negative binomial mixed model to the waterfowl guild data, I ran into this problem. Below is an example of how I handled it.

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00631856 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.5939)  ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
##    Data: guild_totals
## 
##      AIC      BIC   logLik deviance df.resid 
##   1158.7   1172.4   -574.4   1148.7      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7671 -0.6636 -0.3612  0.3186  3.0002 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2323   0.482   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.54422    1.55325   6.145 8.01e-10 ***
## temporada   -0.36895    0.09658  -3.820 0.000133 ***
## season2      1.29676    0.30908   4.195 2.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.987       
## season2    0.219 -0.318

Luckily, I found this neat guide that I followed and found helpful. Below I follow the steps of the aformentioned guide and fix the problem. What the steps below basically do is try to optimize the model, starting from the inital fit but increasing the number of iterations.

## [1] 0.4820253
## [1] 8.6121e-06
## [1] 8.6121e-06

No more failure to converge! So let’s check for overdispersion and look at the summary output of the model.

## $residDev
## [1] 81.82569
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.7506944
## 
## $P.ChiSq
## [1] 0.9758231
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.5939)  ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
##    Data: guild_totals
## Control: glmerControl(optCtrl = list(maxfun = 20000))
## 
##      AIC      BIC   logLik deviance df.resid 
##   1158.7   1172.4   -574.4   1148.7      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7671 -0.6636 -0.3613  0.3186  3.0001 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2323   0.482   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.54401    0.55262  17.270  < 2e-16 ***
## temporada   -0.36894    0.03602 -10.242  < 2e-16 ***
## season2      1.29669    0.30533   4.247 2.17e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.888       
## season2   -0.019 -0.281

It looks like there is a significcant decline in the abundance of waterfowl over time. It also appears, as expected, that there is a significant increase in in the winter months in comparison to the rest of the year.

Final Results of Other Guilds

Below, I present mostly just the numbers from the other guilds.

Waterfowl

Since we just worked on the waterfowl model above, let’s start there and begin by plottin the data.

Recall our model glmernb.fowl2 from directly above.
Similar to before, we can once again calculate the R^2 values. This time we’ll show it in neat table.

R2m R2c
delta 0.2931394 0.3782012
lognormal 0.3952562 0.5099498
trigamma 0.1673828 0.2159532

Once again, it’s looking like we’ve got a pretty decent looking model.

Let’s check the residuals

And now let’s look at the model predctions. Using the estimate of our model, we can say there has been a decline of 30.8533103% over the seasons and an increase of 365.7134803% in the winter.

Similar to the shorebirds, we can see a similar difference in that the decline seems much more stark in the winter.

Gulls/Terns/Skimmers

Check the data

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.1854)  ( log )
## Formula: gulls_terns ~ temporada + season + (1 | obs)
##    Data: guild_totals
## 
##      AIC      BIC   logLik deviance df.resid 
##   1805.0   1818.6   -897.5   1795.0      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0678 -0.7161 -0.2335  0.6357  4.2453 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.08125  0.285   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  7.34682    0.98663   7.446 9.59e-14 ***
## temporada   -0.04662    0.05946  -0.784  0.43309    
## season2      0.57935    0.19179   3.021  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.988       
## season2    0.028 -0.105
## $residDev
## [1] 94.99911
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.8715514
## 
## $P.ChiSq
## [1] 0.8280607

Once again, it looks likee there is a significant decline in the abundance of gulls/terems/skimmers over time as well as a significant increease in the winter due to the influx of migrants. From the model estimates, there is a decline of 4.554998% of the in gulls/terns/skimmers guild. While this isn’t as exaggerated as the other guilds, it is still something worth noting.

Calculate R^2 values

R2m R2c
delta 0.0851377 0.1654172
lognormal 0.1105094 0.2147128
trigamma 0.0590585 0.1147470

Review the residuals

Plot the model predictions

Here we can visualize that the decline is not as steep as the other guilds, but still present. We can also once again see the notable differences between the winter and non-winter seasons.

Pelicans/Cormorants/Allies

Plot the data

Making the model:

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.4891)  ( log )
## Formula: pel_corm ~ temporada + season + (1 | obs)
##    Data: guild_totals
## Control: glmerControl(optCtrl = list(maxfun = 20000))
## 
##      AIC      BIC   logLik deviance df.resid 
##   1433.4   1447.0   -711.7   1423.4      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.1437 -0.6468 -0.3666  0.2948  4.0194 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.4335   0.6584  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.42046    0.20036  42.026  < 2e-16 ***
## temporada   -0.22764    0.01202 -18.942  < 2e-16 ***
## season2      0.87525    0.20483   4.273 1.93e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.510       
## season2   -0.282 -0.280
## $residDev
## [1] 114.2199
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 1.047889
## 
## $P.ChiSq
## [1] 0.3471449

Pelicans/cormorants/allies guild is declining by 20.3589084% each year.

Calculate R^2 values

R2m R2c
delta 0.2305754 0.5309768
lognormal 0.2593109 0.5971499
trigamma 0.1934467 0.4454756

Plot model predictions Something to note here is the axis. If you look at the boxplots above, you’ll see there are two very high counts (ploted as dots aka outliers) in the 2013-2014 season. Now that can very obviously be skewing the data, but take a second to look at the axis of our graphed model predictions and you’ll see that it’s much smaller.

Long-legged Waders

The only guild that lacks a final model

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00645303 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(2.6765)  ( log )
## Formula: waders ~ temporada + season + (1 | obs)
##    Data: guild_totals
## 
##      AIC      BIC   logLik deviance df.resid 
##   1175.5   1189.1   -582.8   1165.5      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3878 -0.6319 -0.1972  0.4254  3.2289 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2503   0.5003  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.02105    1.05489   5.708 1.14e-08 ***
## temporada   -0.11036    0.06322  -1.746   0.0809 .  
## season2     -0.06425    0.14351  -0.448   0.6543    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.993       
## season2   -0.038 -0.018
## $residDev
## [1] 95.30029
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.8743146
## 
## $P.ChiSq
## [1] 0.822308

As you can see, this model failed to onverge and I haven’t worked on getting a better fit yet

Final Table of All Models

We can put a summary of all the important parts of each model into a single table. Note that here the “estimates” of each variable have been back transformed. For example, in the shorebird model, rather than display the estimate of “-0.21397” for temporada (aka the year), the table will show the back transformed intercept (recall that for a negative binomial model we have to use the exp() function) so the number displayed is the incidence rate ratio of 0.8073726. This simply means that as that variable increases by 1 unit, the response variable is multipled by a factor of 0.8073726. Also note that the R^2 values displayed are calculated using the log-normal method that was shown in the tables above.

Shorebirds Waterfowl Gulls/Terns/Skimmers Pelicans/Cormorants/Allies
Predictors Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value
Intercept 31436.98 2794.32 – 353676.27 <0.001 13960.80 4726.26 – 41238.53 <0.001 1551.26 224.32 – 10727.61 <0.001 4538.97 3064.83 – 6722.15 <0.001
Year 0.81 0.70 – 0.93 0.004 0.69 0.64 – 0.74 <0.001 0.95 0.85 – 1.07 0.433 0.80 0.78 – 0.82 <0.001
Season 2.98 2.06 – 4.32 <0.001 3.66 2.01 – 6.65 <0.001 1.78 1.23 – 2.60 0.003 2.40 1.61 – 3.58 <0.001
Random Effects
σ2 0.55 0.99 0.61 0.52
τ00 0.32 obs 0.23 obs 0.08 obs 0.43 obs
ICC 0.36 0.19 0.12 0.46
N 42 obs 42 obs 42 obs 42 obs
Observations 113 113 113 113
Marginal R2 / Conditional R2 0.321 / 0.568 0.395 / 0.510 0.111 / 0.215 0.259 / 0.597

Migrants and Residents?

Another important grouping to make in the bird is between the migrantory and resident species for many reasons. One interesting reason just based off the guild muilds is that when looking at the plotted model predictions, it seemed that in general, the decrease in birds over time was more pronounced in the winter season.

##  [1] "Date"        "migrants"    "residents"   "Cloud"       "Temp"       
##  [6] "Precip"      "Wind"        "temporada"   "szn"         "obs"        
## [11] "tide_height" "tide_dir"    "mes"         "mes.yr"      "season"

Migrants

Start with migrants. Plot data:

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.9745)  ( log )
## Formula: migrants ~ temporada + season + (1 | obs)
##    Data: mig
## 
##      AIC      BIC   logLik deviance df.resid 
##   1807.2   1820.8   -898.6   1797.2      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3566 -0.5788 -0.1004  0.4177  2.9619 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2397   0.4896  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  7.90685    1.09963   7.190 6.46e-13 ***
## temporada   -0.13459    0.06464  -2.082   0.0373 *  
## season       0.78086    0.15729   4.965 6.89e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.973       
## season    -0.192 -0.013

12.5925791% decrease over the years 218.3349139% higher in the winter

Plot the residuals

Calculate R^2 values

R2m R2c
delta 0.2065678 0.4611181
lognormal 0.2302980 0.5140908
trigamma 0.1782550 0.3979160

Plot model predictions

First we’ll look at the effects of year on the model, with the other effects considereed constants.

And now compare that graph to the model, but dividing out the season as we

Residents

Repeat workflow Plot data:

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(2.2867)  ( log )
## Formula: residents ~ temporada + season + (1 | obs)
##    Data: mig
## 
##      AIC      BIC   logLik deviance df.resid 
##   1669.8   1683.4   -829.9   1659.8      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3347 -0.6801 -0.2256  0.4741  3.0598 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.3366   0.5802  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.94511    1.20565   4.931 8.18e-07 ***
## temporada   -0.05652    0.07153  -0.790    0.429    
## season       0.82429    0.15936   5.173 2.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.978       
## season    -0.152 -0.034

Residents, although decreasing it’s not significant. While they aren’t necessarily increasing either, this means that the birds are probably fluctuating year to year but overall are not changing. This is good!! Also, similar to all other groups, the numbers of residents increases 228.0261205% in the winter months.

Plot the residuals

Calculate R^2 values

R2m R2c
delta 0.1814263 0.5366354
lognormal 0.1970367 0.5828090
trigamma 0.1625537 0.4808129

Plot model predictions As we can see in the model prediction graphs above- our confidence intervals are very large. And though the trend is a decrease over time, there is not a lot to support that statement. So in looking at these graphs, we can see how much overlap there is in the points and confidence intervals.

Final Table of Mig Models

Repeating what we did, we can create a model summary table for our two. And once again, what were the intercepts of the model have been back transformed into incidence rate ratios.

Migrants Residents
Predictors Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value
Intercept 2715.83 314.70 – 23437.40 <0.001 381.88 35.95 – 4056.77 <0.001
Year 0.87 0.77 – 0.99 0.037 0.95 0.82 – 1.09 0.429
Season 2.18 1.60 – 2.97 <0.001 2.28 1.67 – 3.12 <0.001
Random Effects
σ2 0.41 0.36
τ00 0.24 obs 0.34 obs
ICC 0.37 0.48
N 42 obs 42 obs
Observations 113 113
Marginal R2 / Conditional R2 0.230 / 0.514 0.197 / 0.583

This is a clean way to summarize some stuff.

Some Thoughts

Given that most of the guilds seem to be trending downward. Let’s take a look once again at the predicted values of the models for each guild. In each of the models, the starting point was much higher in the winter and seemingly decreased more notably.

One reason I had moved from working with biodiversity indices to working on this modeling project was because it was clear that important aspect of the survey data was missing. For the calculating the biodiversity indices for example, birds had to be identified to species. This completely glosses over the impact of larger aggregations of birds that are identified to a group but not necessarily species (ie estimations of “1000 peeps”). In separting the birds into guilds, this important aspect of data that can be captured.

So in looking at our model, it doesn’t seem as if residents are changing but migrants are. This might suggest that the overall declines across the bird guilds are unequally attributed to declines in migrants overwintering. This could be a big jump to make, but it’s something that seemingly makes sense. BUT ALSO with the separation between migrant/resident spp, I did not include these unidentified birds. Despite this, we still saw some interesting changes and that migrants are following these trends to be declinging.

But finally, a big thing to keep in mind is that there is a lot of variability in all our data over time so all results should be taken with a grain of salt.

Future Directions

While this was fun, the work isn’t done. For example, there are several other cool questions that can be answered. Part of the problem is the surveys lack a consistent protocol. For example, I do now know how people in 2015 measured the tide or temperature. Gathering better data on historic weather/climate conditions would be interesting to look at changes or potential correlations between temp and bird relative abundances. Though we take data on the temperature during each monitoring effort, examining climate from a wider lens would be interesting. Precipitation patterns or extreme temperature events could potentially be affecting the phenology of migration in some birds that may be using temp/precip patterns as environmental cues. While some temperature and precipitation data are available here, I found the data in the area to be incomplete and lacking the last few years. Another interesting direction would be to relate the abundances of shorebirds (and/or other guilds) to tides. While we take data on the tides, it would be possible to more accurately characterize the tide.